The Epstein-Barr virus (EBV) infects more than 90% of the human population, and is the cause 
of several diseases, both serious and mild. It is a tumor virus, and has been widely studied as a model 
system for gene (de)regulation in humans. A central feature of the EBV life cycle is its ability to 
persist in human B cells in states denoted latency I, II and III. In latency III the host cell is driven to 
cell proliferation and hence expansion of the viral population, but does not enter the lytic pathway, 
and no new virions are produced, while the latency I state is almost completely dormant. In this 
paper we study a physico-chemical model of the switch between latency I and latency III in EBV. 
We show that the unusually large number of binding sites of two competing transcription factors, 
one viral and one from the host, serves to make the switch sharper (higher Hill coefficient), either 
by cooperative binding between molecules of the same species when they bind, or by competition 
between the two species if there is sufficient steric hindrance. 



PACS numbers: 87.16.Yc,87.17.Aa,05.90.+m 

I. INTRODUCTION 

Genetic switches, mainly in bacteria, have recently in- 
terested statistical physicists, and work in this direction 
has been extensively reviewed in [H, 0] . The fundamental 
assumption is that gene transcription, the copying of a 
stretch of DNA into mRNA, is either "on" or "off". This
state of transcription depends on whether certain
gene-specific DNA-binding proteins, transcription factors, are
bound, or not, to the promoter region of the gene. A gene 
may be controlled by one or more transcription factors, 
each having a varying number of binding sites in the pro- 
moter region. The action of the transcription factor may 
in turn be either inhibitory or excitatory. Inhibition can 
arise from blocking access of the RNA-Polymerase to the 
transcription start site, while a stimulating effect is ob- 
tained if the bound factor stabilizes the Polymerase-DNA 
complex. A paradigmatic example where both effects are 
present is lysogeny maintenance in phage λ [1,0]. DNA
looping, where distantly bound transcription factors in- 
teract and affect transcription, is also possible. 

At a given transcription factor concentration, each pos- 
sible state of promoter bound factors occurs with a prob- 
ability given by a grand canonical ensemble formula. The 
promoter region with the binding sites (with or without 
transcription factors) corresponds to the small system, 
and the cytoplasm, with a large number of transcription 
factors moving around, serves as the reservoir. Quite 
often transcription factors bind in dimer (or multimer)
form, in which case the relevant concentration is deter- 
mined by balance from the total concentration. In sum- 
mary, the rate of transcription is a non-linear, sometimes 
quite complicated, function of the concentrations of the 
transcription factors regulating the gene. 

One important property in gene regulation is cooperativity. If a single
copy of a protein molecule in monomer form were to (positively) regulate
a certain gene, the activity of that gene would follow the well-known
Michaelis-Menten curve. The transcription rate would then be proportional
to the concentration of the regulating molecule, up to a threshold above
which it would level off. In other words, there would be appreciably high
transcription even at very low concentrations of the regulating protein.
The rationale for transcription factors often binding in multimer form,
and for multiple DNA binding sites enabling cooperative interactions, is
therefore assumed to be that it results in a sharper, more
"all-or-nothing" switch.

Multiple binding sites for one and the same transcription factor are
common in eukaryotic promoters. The object of this paper is one particular
viral example with no less than 20 binding sites for a viral factor, where
transcriptional activity has been observed to require 8 bound molecules
[f| Q, see section II below. In addition, these sites are interleaved with
an equal number of binding sites for a host transcription factor, which
presumably has the opposite effect. In a previous contribution [7], we
introduced, for reasons of computational simplicity, a thermodynamic model
of this promoter switch, ignoring possible cooperative binding and
allowing some steric hindrance. Although direct experimental evidence is
lacking, cooperative binding of the viral transcription factor at this
promoter is likely to be present, as well as more extensive blocking
scenarios due to the closely spaced sites. Both these mechanisms are likely
to affect the sharpness of the switch.

We show in this paper that while cooperative protein interactions are one
way to achieve effective cooperativity of the switch, accounting for full
steric hindrance (blocking) of one species of molecules on the other is a
more effective one. Therefore, a possible functional role of the
alternating pattern of binding sites could be to increase effective
cooperativity when the promoter architecture does not allow for
cooperative molecular interactions.

The paper is organized as follows: in section II we
describe our example, and in section III we describe our
model of cooperativity and competition in this example.
In section IV we summarize and discuss our results.



II. THE EPSTEIN-BARR VIRUS, THE EBNA-1 
PROTEIN, AND THE C PROMOTER 

The Epstein-Barr virus (EBV) belongs to the gamma-herpesvirus family, with
relatives among other primate lymphocryptoviruses, and has likely
co-evolved with man for a very long time [8]. Although not discovered
until the 1960s, it is now known to infect more than 90 % of the human
population. The infection is asymptomatic if it occurs early in life,
while later infection may result in infectious mononucleosis, more
commonly known as "the kissing disease". The virus infects new hosts
through virus particles shed from epithelial cells in the throat, and can
persist in the host's blood B cells for a long time, in at least three
distinct latent states known as latency I, II and III. EBV is medically
important primarily because some forms of cancer are invariably associated
with the viral infection [9].

The most vital EBV protein is EBNA-1, a transcription factor involved in
replication, episome partitioning as well as gene regulation. In latency I,
EBNA-1 is produced from RNA transcripts originating from the Q promoter
(Qp) on the EBV genome. EBNA-1 down-regulates transcription from Qp by
binding to sites downstream of the transcription start site [11]. In
latency III, on the other hand, EBNA-1 is produced together with five other
proteins by alternative splicing of a longer RNA transcribed from the EBV
C promoter (Cp) [12]. EBNA-1 positively regulates Cp activity by binding
to the "family-of-repeats" (FR) region, positioned upstream of the start
site [13]. The physical description of this regulatory element is the
topic of the present paper.

The FR region consists of 20 consecutive binding sites for EBNA-1 [14].
There are minor variations in the DNA sequence among these sites, but they
are all experimentally verified, and approximately equally strong, binding
sites [15]. Comparing promoter activity from constructs with varying
numbers of binding sites in FR revealed that at least eight sites are
necessary to have full transcriptional activation, see Table I. Recent
studies have identified an equal number of octamer binding sites at FR,
juxtaposed with the EBNA-1 sites [16]. The action of the human
transcription factor Oct-2, identified as binding to these octamer sites
in complex with the co-factors Groucho/TLE, is believed to be inhibitory
[17].

In summary, the Cp activity is largely regulated by binding of two species
of molecules, EBNA-1 and Oct-2. They can each bind to 20 sites, and have
antagonistic effects when bound. Due to the closely spaced binding sites,
Oct-2 and EBNA-1 compete for binding to FR. It is however not
experimentally known if one bound Oct-2 blocks one or both of the
neighbouring sites for EBNA-1, and vice versa. The other unknown aspect is
whether there exists cooperative binding between EBNA-1 proteins at FR,
and if so, the strength of these interactions [18]. Therefore we explore
the effects of cooperative binding and blocking, with emphasis on how the
effective cooperativity of the promoter switch, i.e. the sharpness of the
switch, is affected.



III. COOPERATIVE BINDING AND 
COMPETITION 

The general thermodynamic framework is the following. Suppose a number of
transcription factors $TF_1, TF_2, \ldots, TF_m$ can bind in different
states indexed by $s$ around the start of a gene. The number of
transcription factors of type $TF_i$ bound in state $s$ is $n_i(s)$, the
association free energy is $\Delta G_s$, and the rate of transcription of
the gene is $R_s$. Suppose further that $[TF_i]$ is the concentration of
transcription factor $TF_i$ in the surrounding cytoplasm, in the form in
which this transcription factor binds. Then the binding sites, with or
without bound transcription factors, can be considered a small system,
exchanging particles (transcription factors) and energy with the larger
reservoir. The probability of the small system being in state $s$ is



$$P_s \propto [TF_1]^{n_1(s)} \cdots [TF_m]^{n_m(s)} \exp\left(-\frac{\Delta G_s}{k_B T}\right) \qquad (1)$$

and the net average rate of transcription is

$$R([TF_1], \ldots, [TF_m]) = \sum_s R_s P_s \qquad (2)$$

The key assumption behind (2) is that the probabilities
in (1) equilibrate on a time scale much shorter than
the time scales on which the concentrations
$[TF_1], [TF_2], \ldots, [TF_m]$ change appreciably.
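
As an illustration, Eqs. (1) and (2) can be sketched in a few lines of Python. This is a minimal sketch and not the code used for the results in this paper: the function names, the toy one-site example, the choice of concentration units (standard-state concentration set to 1) and the value kT = 0.616 kcal/mol (roughly 37 °C) are all assumptions made here for illustration.

```python
import math

KT = 0.616  # kcal/mol; assumed thermal energy near 37 C (not stated in the text)

def state_probabilities(states, conc, kT=KT):
    """Normalized probabilities P_s following Eq. (1).

    states: list of (occupancy, dG, rate) tuples, where occupancy maps a
            factor name to the number n_i(s) of bound molecules.
    conc:   dict of free concentrations [TF_i], in units where the
            standard-state concentration is 1 (assumption of this sketch).
    """
    weights = []
    for occupancy, dG, _rate in states:
        w = math.exp(-dG / kT)          # Boltzmann factor of the state
        for tf, n in occupancy.items():
            w *= conc[tf] ** n          # one concentration factor per bound molecule
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]

def mean_rate(states, conc, kT=KT):
    """Average transcription rate R = sum_s R_s P_s, Eq. (2)."""
    probs = state_probabilities(states, conc, kT)
    return sum(p * rate for p, (_occ, _dG, rate) in zip(probs, states))

# Hypothetical toy system: one site, one activator "A".
# The empty state is off (rate 0), the bound state is on (rate 1).
toy_states = [({}, 0.0, 0.0), ({"A": 1}, -1.0, 1.0)]

for c in (0.01, 0.1, 1.0, 10.0):
    print(f"[A] = {c:5.2f}  R = {mean_rate(toy_states, {'A': c}):.3f}")
```

With a single binding site the printed rates trace out a simple saturation curve; the sharper switches discussed below arise only when many sites and a threshold number of bound molecules enter the sum over states.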

In the present example, states can be labeled by
$n$, the number of EBNA-1 molecules bound, $k$, the
number of Oct-2 molecules bound, $n_1$, the number of
cooperative bindings between bound EBNA-1 molecules,
and $k_1$, the number of cooperative bindings between
bound Oct-2 molecules. Every such state has a binding
free energy of

$$\Delta G_{n,k,n_1,k_1} = n E_E + k E_O + n_1 E_{E1} + k_1 E_{O1} \qquad (3)$$

where $E_E = -15.45$ kcal/mol and $E_O = -12.28$
kcal/mol [19] are the known binding free energies of
EBNA-1 and Oct-2 to binding sites in FR, and $E_{E1}$ and
$E_{O1}$ are the unknown cooperative binding energies. In
the numerical experiments described in this paper we
only examine EBNA-1 cooperativity. $E_{E1}$ is taken proportional
to $E_E$, in the range from 0 % (no cooperativity) up to 40
%. The total probability of the states with given values
of $n$, $k$, $n_1$ and $k_1$ is hence



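$$P_{n,k,n_1,k_1} \propto \xi(n,k,n_1,k_1)\, [E]^n [O]^k \exp\left(-\frac{\Delta G_{n,k,n_1,k_1}}{k_B T}\right) \qquad (4)$$

where $\xi(n,k,n_1,k_1)$ is the number of such states, and the
overall rate of transcription is

$$R \propto \sum_{n=8}^{N} \sum_{k=0}^{N-n} \sum_{n_1=0}^{n-1} \sum_{k_1=0}^{N-n-1} P_{n,k,n_1,k_1} \qquad (5)$$

where $N$ is the number of binding sites.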
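
To give a feeling for the magnitudes entering Eq. (3), the following Python sketch evaluates the binding free energy of a state and the Boltzmann factor of a single cooperative bond. It is only an illustration: the temperature (310 K), the evenly spaced grid of five cooperative strengths and all names are assumptions made here; only the values of the two binding free energies and the 0-40 % range come from the text.

```python
import math

R_KCAL = 0.0019872   # gas constant in kcal/(mol K)
T_KELVIN = 310.0     # assumed temperature; the text does not state T
KT = R_KCAL * T_KELVIN

E_E = -15.45  # kcal/mol, EBNA-1 binding free energy (from the text)
E_O = -12.28  # kcal/mol, Oct-2 binding free energy (from the text)

def delta_g(n, k, n1, k1, frac_e=0.0, frac_o=0.0):
    """Eq. (3), with cooperative energies given as fractions of E_E and E_O."""
    return n * E_E + k * E_O + n1 * frac_e * E_E + k1 * frac_o * E_O

# Hypothetical grid of five cooperative strengths between 0 % and 40 %.
for frac in (0.0, 0.1, 0.2, 0.3, 0.4):
    e_coop = frac * E_E
    factor = math.exp(e_coop / KT)   # < 1 for attractive (negative) e_coop
    print(f"{frac:4.0%}  E_E1 = {e_coop:6.2f} kcal/mol  exp(E_E1/kT) = {factor:.2e}")
```

The upper end of the range, 40 % of 15.45 kcal/mol, is 6.18 kcal/mol, i.e. the value of about 6.2 kcal/mol quoted for the strongest cooperative binding considered.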

As described briefly in the introduction, one can imagine two plausible
blocking scenarios at FR. The first and simplest is that each bound
molecule hinders binding of the competing species to the closest
neighbouring site on one side. This is referred to as single-side blocking
(Fig. 1a). The other scenario is that each bound molecule sterically
hinders both neighbouring sites for the other molecule: double-side
blocking (Fig. 1b). The blocking method naturally affects the number of
possible bound configurations, as seen in Eq. (5). The upper bound in the
sum over $k$ is $N - n$ in the single-side blocking model, but at most
$N - n - 1$ in the double-side blocking model for all $n$ greater than
zero. Similarly, the sums over $n_1$ and $k_1$ may effectively run over
smaller ranges; e.g. in the double blocking scenario with both molecules
bound and $n + k = N - 1$, all EBNA-1 and Oct-2 molecules bind together in
two groups, hence $n_1 = n - 1$ and $k_1 = k - 1$.

FIG. 1: Illustration of the two blocking scenarios. EBNA-1
(E) binds the black sites while Oct-2 (O) binds the blue
sites. a) The single-side blocking model, where a bound E
blocks the closest O site to the right, and a bound O blocks E
binding to the closest site on the left. The 40 binding sites are
represented as 20 sites, where each site can be bound by E or
O. b) The double-side blocking model, where one bound E or
O blocks the opposite molecular species from binding at both
neighbouring sites. The 20 sites can be bound by E or O, with the
restriction that there has to be an empty site between any bound O
and a bound E to its right.



Brute-force counting of $\xi(n, k, n_1, k_1)$ is not feasible, as
the number of states in this model is up to $3^{20} \approx 3.5 \cdot 10^9$
(in the model with single-side blocking only). Efficient
calculation of $\xi(n, k, n_1, k_1)$ involves two aspects. First,
elementary combinatorics is used to build up a paradigmatic
"balls-baskets" problem, which counts, under different
constraints, the number of ways in which one can put a certain
number of balls into another number of baskets. Second,
we describe all effects efficiently, including double-side
blocking, cooperativity and the combination of both,
in a three-step algorithm:

1. Construct a backbone sequence (S0) made up of
two types of baskets ($b_E$, $b_O$), one for each of the two
types of molecules.

2. Distribute $n$ Es and $k$ Os among these baskets,
forming a sequence (S1) consisting only of E and
O.

3. Consider the front, the end and the $n + k - 1$ in-between
positions of S1 as baskets ($b_\phi$) for empty binding
sites $\phi$. Insert $N - (n + k)$ empty sites into these
positions to get the final pattern (S2).

By setting $N = 20$, the actual number of sites is reduced
by half, and the single-side blocking model is the
default. Double-side blocking is realized by treating
the $b_\phi$ between an "OE" segment in S2 as must-be-filled
baskets (Fig. 1b). The number of cooperative units, $n_1$,
is counted by recording the number of "EE" in S2, minus
the number of $b_\phi$ that have been filled with $\phi$.
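
For a small number of sites, the counts $\xi(n, k, n_1, k_1)$ can be checked by direct enumeration. The Python sketch below does exactly that; it is a brute-force check for small $N$ only and is not the three-step balls-baskets algorithm described above. The encoding is an assumption of this sketch: each composite site is 'E', 'O' or '-' (empty), double-side blocking is implemented following the Fig. 1 caption by forbidding an 'O' immediately followed by an 'E', and cooperative bonds are counted as adjacent 'EE' and 'OO' pairs.

```python
from collections import Counter
from itertools import product

def count_states(N, double_blocking=False):
    """Count configurations by (n, k, n1, k1) for a small number of sites N.

    Three states per composite site already encode single-side blocking.
    With double_blocking=True, patterns containing 'OE' are excluded
    (assumed reading of the Fig. 1 caption).
    """
    xi = Counter()
    for config in product("EO-", repeat=N):
        s = "".join(config)
        if double_blocking and "OE" in s:
            continue
        n, k = s.count("E"), s.count("O")
        # Adjacent identical bound neighbours count as one cooperative bond.
        n1 = sum(1 for a, b in zip(s, s[1:]) if a == b == "E")
        k1 = sum(1 for a, b in zip(s, s[1:]) if a == b == "O")
        xi[(n, k, n1, k1)] += 1
    return xi

single = count_states(6)
double = count_states(6, double_blocking=True)
print("states, single-side blocking:", sum(single.values()))
print("states, double-side blocking:", sum(double.values()))
```

Since the enumeration visits all $3^N$ strings, it is only practical for $N$ well below 20, which is the reason a combinatorial counting scheme is needed for the full FR region.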

To examine the effective cooperativity in the transition
from $P \approx 0$ to $P \approx 1$ we compute the Hill
coefficient. This is the logarithmic derivative of the ratio
of the probability of transcription to the probability of no
transcription, with respect to the logarithm of the free
ligand concentration. The Hill coefficient is a function of
the ligand concentration, but the effective Hill coefficient
is customarily taken at half saturation:

$$\frac{d \lg \frac{P}{1-P}}{d \lg [E_{free}]} \quad \text{at } P = 0.5 \qquad (6)$$

In this paper we explore the Hill coefficient functions 
to see how blocking and cooperative binding influence 
the effective cooperativity of the switch. Three cases
are studied: 1) cooperative binding of EBNA-1 and no
competing molecular species, 2) cooperative binding of 
EBNA-1 with single-side blocking between the compet- 
ing species, and 3) cooperative binding of EBNA-1 with 
double-side blocking between the competing species. 

IV. EFFECTIVE COOPERATIVITY OF THE 
SWITCH 

One convenient way to visualize the cooperativity of
the switch is to plot the ratio $\frac{P}{1-P}$ vs. the local Hill
coefficient, given as $\frac{d \lg \frac{P}{1-P}}{d \lg [E_{free}]}$. For very high and very low
concentrations of EBNA-1, corresponding to very large
and very small values of $P$, it is easy to see that in our
model $1 - P \sim A [E_{free}]^{7-N}$ and $P \sim B [E_{free}]^{8}$, respectively.
$A$ and $B$ are constants, and $N$ is the total number of
binding sites in FR. Accordingly, the extreme local Hill
coefficients are $N - 7$ and 8. Fig. 2 illustrates this limit
behaviour for three values of $N$.

FIG. 2: Hill coefficient curves for three different numbers of
binding sites in the system: 10, 15 or 20. For each case, the
limit at low EBNA-1 concentrations (low $P$) is at Hill coefficient
8, since eight bound molecules is the defined minimum for
transcription to be on. For the upper limit, at high $P$, the Hill
coefficient approaches $N - 7$.

In the region of main interest, where $P \approx 0.5$, the Hill
coefficient curves show very different behavior for the
three models. Without any cooperative interactions,
and without competition, the effective Hill coefficient is
substantially lower than both its limits. This baseline
function for the system has an effective Hill coefficient
of 3.5 (Fig. 3, circled lines). This low Hill coefficient
remains even with competition from Oct-2 binding:
for single-side blocking, the effective cooperativity is
practically insensitive to Oct-2 levels. On the contrary,
competition with double-side blocking dramatically
alters the shape of the Hill coefficient curve, to a
sigmoidal interpolation between the limits 8 and $N - 7$.
The effective Hill coefficient then changes from 3.5 up
to 10.5, for saturating amounts of Oct-2 (Fig. 3, dotted
lines).

FIG. 3: Hill coefficient curves for both the single- and double-side
blocking cases. The circles correspond to single-side blocking,
where the Oct-2 concentration does not affect the effective
cooperativity at all. On the other hand, in the double-side
blocking model, the Oct-2 concentration dramatically alters the
cooperativity of the switch. For saturating levels of Oct-2, the
effective Hill coefficient approaches 10.5 (dotted black line).

From a theoretical point of view, the thermodynamic
model of the switch is a (finite, one-dimensional) Ising-like
model with three states at each site: bound by
EBNA-1, bound by Oct-2, or free. The only complication
in computing the "ON" probability ($P$) is that only
states with enough bound EBNA-1 count, which mixes
a global variable into the elementary statistical mechanical
model. The single-blocking results can however be readily
understood. With no cooperative binding and only
single blocking, one can sum over $k$ in (4) to obtain the
model studied in [7], that is

$$P_n \propto \binom{N}{n} [E]^n e^{-n E_E / k_B T} \left(1 + [O]\, e^{-E_O / k_B T}\right)^{N-n} \qquad (7)$$

Including the normalization this means

$$P_n = \binom{N}{n} \frac{z^n}{(1+z)^N}, \qquad z = \frac{[E]\, e^{-E_E / k_B T}}{1 + [O]\, e^{-E_O / k_B T}} \qquad (8)$$

and the ratio between ON and OFF probabilities is
therefore a function of the variable $z$ only:

$$\frac{P}{1-P} = \frac{\sum_{n=8}^{N} \binom{N}{n} z^n}{\sum_{n=0}^{7} \binom{N}{n} z^n} \equiv f(z) \qquad (9)$$

The local Hill coefficients are

$$\frac{d \lg \frac{P}{1-P}}{d \lg [E_{free}]} = \frac{d \lg f(z)}{d \lg z} \qquad (10)$$

which, like the ratio $\frac{P}{1-P}$, depends on the concentration
of the second molecule [O] only through $z$. The effective
cooperativity in the model without cooperative binding
and only single blocking hence does not depend on [O],
as shown in the curves in Fig. 3. The Hill coefficient at
$P = \frac{1}{2}$ can be estimated by approximating the binomials
with a Gaussian distribution,

$$P \propto \int_0^{\infty} \exp\left(-\frac{(x + c\sigma)^2}{2\sigma^2} + x \log z\right) dx \qquad (11)$$

where, in the case at hand, $C = \frac{N}{2} = 10$, $\sigma = \frac{\sqrt{N}}{2} = \sqrt{5}$
and $c = -\frac{2}{\sqrt{5}}$. Half-filling
is achieved at $z^* = e^{c/\sigma}$, and the Hill coefficient is
$\sqrt{8\sigma^2/\pi} \approx 3.6$, which accords quite well with the
minimum value in Fig. 3. The switch is therefore much
less sharp than the limits of 8 and $N - 7 = 13$, at
$P \approx 0$ and $P \approx 1$ respectively, could have led one to
believe. We note that the sharpness increases with $N$
(as long as the threshold stays around $N/2$), but only as
the square root of $N$: more than a hundred consecutive
binding sites are necessary to reach a Hill coefficient of
about ten in a model of this kind.
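
The estimate can be checked numerically against the binomial model of Eqs. (8)-(10). The Python sketch below is an illustration under stated assumptions, not the code behind Fig. 3: function names are hypothetical, the half-saturation point is found by a simple bisection, and the local Hill coefficient is evaluated through the identity that, for weights $\binom{N}{n} z^n$, the logarithmic derivative of $P/(1-P)$ with respect to $z$ equals the mean number of bound EBNA-1 in ON states minus the mean number in OFF states.

```python
import math

def hill_binomial(z, N=20, threshold=8):
    """Return (P, local Hill coefficient) for the model of Eqs. (8)-(10).

    The Hill coefficient d ln(P/(1-P)) / d ln z is computed as
    <n>_ON - <n>_OFF, which follows from differentiating Eq. (9).
    """
    w = [math.comb(N, n) * z**n for n in range(N + 1)]
    on, off = sum(w[threshold:]), sum(w[:threshold])
    mean_on = sum(n * w[n] for n in range(threshold, N + 1)) / on
    mean_off = sum(n * w[n] for n in range(threshold)) / off
    return on / (on + off), mean_on - mean_off

def z_at_half(N=20, threshold=8):
    """Bisection in ln z for the point where P = 0.5 (P increases with z)."""
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        p, _h = hill_binomial(math.exp(mid), N, threshold)
        lo, hi = (mid, hi) if p < 0.5 else (lo, mid)
    return math.exp(0.5 * (lo + hi))

z_half = z_at_half()
p_half, h_exact = hill_binomial(z_half)
h_gauss = math.sqrt(2 * 20 / math.pi)  # sqrt(8 sigma^2 / pi) with sigma^2 = N/4
print(f"z* = {z_half:.3f}, Hill at P = 0.5: {h_exact:.2f}, Gaussian estimate: {h_gauss:.2f}")
```

Because the result depends on [O] only through $z$, the same number is obtained for any Oct-2 concentration in the single-side blocking model without cooperativity.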

In the model with double blocking, on the other
hand, the effective cooperativity can clearly be much
larger, and it also depends on [O]. This is easy to
understand in the limit where [O] is large; EBNA-1
and Oct-2 then compete for binding sites, and the possibility
that a site is left free can be disregarded. Therefore,
if $n$ copies of EBNA-1 are bound, then also $N - n$
copies of Oct-2 are bound, altogether in the pattern
EEEE...OOOO with statistical weight

$$P_n = \frac{x^n}{1 + x + x^2 + \cdots + x^N}, \qquad x = \frac{[E]\, e^{-E_E / k_B T}}{[O]\, e^{-E_O / k_B T}} \qquad (12)$$

The Hill coefficient is then only a function of $x$, such
that the curve in Fig. 3 has a limit when [O] becomes
large, and the value of the Hill coefficient at e.g. $x = 1$
then lies between the limits of 8 and 13. Competition
with a second molecule therefore makes the switch
sharper for double-side blocking, in contrast to the
situation in single-side blocking.
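
The saturating limit of Eq. (12) is simple enough to evaluate directly. The following Python sketch (function name and sample values of $x$ are assumptions for illustration) computes the local Hill coefficient for weights proportional to $x^n$, again as the mean number of bound EBNA-1 in ON states minus the mean number in OFF states. At $x = 1$ all weights are equal, so the result is the mean of 8..20 minus the mean of 0..7, i.e. 14 - 3.5 = 10.5.

```python
def hill_saturated(x, N=20, threshold=8):
    """Local Hill coefficient d ln(P/(1-P)) / d ln x for P_n ~ x**n, Eq. (12)."""
    w = [x**n for n in range(N + 1)]
    on, off = sum(w[threshold:]), sum(w[:threshold])
    mean_on = sum(n * w[n] for n in range(threshold, N + 1)) / on
    mean_off = sum(n * w[n] for n in range(threshold)) / off
    return mean_on - mean_off

for x in (0.01, 0.5, 1.0, 2.0, 100.0):
    print(f"x = {x:7.2f}  Hill = {hill_saturated(x):.2f}")
```

The printed values interpolate between the limits 8 (small $x$) and 13 (large $x$), passing through 10.5 at $x = 1$, in line with the saturating value quoted for the double-side blocking model.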

The case with cooperativity can be understood qualitatively
with the helix-coil model of protein physics. Without
Oct-2, the statistical model can be written as a factor
$h = [E]\, e^{-(E_E + E_C)/k_B T}$ for each letter E, and a penalty
$c = e^{E_C / k_B T}$ for every start letter of a string of E's,
where $E_C$ is the cooperative binding energy. In
an infinitely long string, the fraction of letters E as well
as the frequency of initiation of a string of E's are
calculated from the leading eigenvalue of the transfer
matrix [20]. In our case, the interesting region is obviously
where that fraction is around 40%, as 8 sites out of 20
need to be filled to have transcription from Cp. If $c$ is
close to one, cooperative binding is weak, and the switch
is similar to the single-blocking case discussed above. If
on the other hand $c$ is much less than one, the expected
fraction of letters E can be larger than 40%, while the
expected frequency of initiation of a string of E's is less
than once in twenty sites. Eventually, we would expect
that either all twenty sites are bound, or no sites in FR
are bound. This describes a situation where all twenty
molecules have to bind simultaneously, in which case
the Hill coefficient is 20.
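
This qualitative picture can be explored for the finite chain of 20 sites. The Python sketch below is a minimal illustration, not the procedure used for Fig. 4: it assumes no Oct-2, uses a site-by-site recursion (a polynomial form of the transfer matrix) to collect the total weight of all configurations with $n$ bound E's, and then evaluates the Hill coefficient at half saturation with respect to $h$. The function names, the bisection bounds and the sample values of $c$ are assumptions made here.

```python
import math

def run_weights(N, c):
    """g[n] = sum over configurations with n E's of c**(number of E-strings).

    free[n] / bound[n] hold the weight of partial chains with n E's whose
    last site is empty / occupied; the factor h**n is applied later.
    """
    free, bound = [1.0] + [0.0] * N, [0.0] * (N + 1)
    for _ in range(N):
        new_free = [f + b for f, b in zip(free, bound)]
        # Adding an E starts a new string (penalty c) or extends one.
        new_bound = [0.0] + [c * f + b for f, b in zip(free[:-1], bound[:-1])]
        free, bound = new_free, new_bound
    return [f + b for f, b in zip(free, bound)]

def hill_at_half(N=20, threshold=8, c=1.0):
    """Hill coefficient d ln(P/(1-P)) / d ln h at P = 0.5."""
    g = run_weights(N, c)

    def p_and_hill(log_h):
        # Log-weights, shifted by their maximum to avoid overflow.
        logs = [math.log(g[n]) + n * log_h for n in range(N + 1)]
        top = max(logs)
        w = [math.exp(v - top) for v in logs]
        on, off = sum(w[threshold:]), sum(w[:threshold])
        mean_on = sum(n * w[n] for n in range(threshold, N + 1)) / on
        mean_off = sum(n * w[n] for n in range(threshold)) / off
        return on / (on + off), mean_on - mean_off

    lo, hi = -30.0, 30.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if p_and_hill(mid)[0] < 0.5 else (lo, mid)
    return p_and_hill(0.5 * (lo + hi))[1]

for c in (1.0, 1e-2, 1e-4, 1e-8):
    print(f"c = {c:.0e}  Hill at half saturation = {hill_at_half(c=c):.2f}")
```

For $c = 1$ the recursion reproduces the binomial coefficients, so the non-cooperative value is recovered, while for very small $c$ the Hill coefficient moves towards the all-or-nothing limit of 20.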

The addition of cooperative binding of EBNA-1 to
the single- and double-side blocking models hence changes
the effective Hill coefficient differently, depending on the
model. Fig. 4 displays the curves for 5 different cooperative
binding strengths, when no Oct-2 is competing
for the FR sites. The range of cooperative strength here
is from 0 % up to 40 % of the DNA affinity, i.e. $\approx 6.2$
kcal/mol, where the effective Hill coefficient is increased
from 3.5 to 16.

FIG. 4: Hill coefficient curves for the model without any
competitive molecular species (Oct-2) but with increasing 
strength of cooperative binding between EBNA-1. The co- 
operative binding is varied from 0-40 % of the DNA bind- 
ing strength of EBNA-1. This range corresponds to 0-6.2 
kcal/mol in binding energies. With no competition and only 
added cooperative interaction, the effective Hill coefficient 
changes dramatically from 3.5 up to 16 for the 6.2 kcal/mol 
cooperative binding energy. 

However, in the real system the competitive protein
Oct-2 is likely to be present, perhaps even at very high
concentrations. For single-side blocking, additional
cooperative binding of EBNA-1 does not have
the same impact when Oct-2 levels are high. Instead
of a 4-fold change, from 3.5 to 16, the effective Hill
coefficient is now only doubled, from 3.5 to 7 (compare
Figs. 4 and 5, solid lines). This is to be compared with the
double-side blocking model, where the effective cooperativity
is relatively high even without cooperative binding.
Adding up to 40 % cooperative binding strength, the
Hill coefficient is almost doubled, from 10.5 to 18 (Fig. 6).
A conclusion to draw from this is that this type of architecture,
with alternating binding sites for two antagonistic
factors, can be one approach to creating an
effective switch for genetic control. For EBV, the FR region
is known for its enhancer function, as well as for forming a
looped structure with another EBNA-1 binding region on
the viral genome, the dyad symmetry (DS) [21, 22]. This
structure is involved in replication initiation control. If
the EBNA-1 binding sites in FR were arranged in
the same manner as in DS, i.e. much closer in space,
there might be cooperative binding even at FR.
However, since FR also seems to play an important role in
forming a looped structure, there might be a structural
reason behind these more sparsely placed sites, which do not
enable the same type of tight interactions. And, as we
show here, there is no need for cooperative interactions
to get a sharp switch of Cp activity, as long as there is
efficient steric hindrance.

FIG. 5: Hill coefficient curves for the model with a high concentration
of the competitive molecular species (Oct-2), single-side blocking
and different strengths of cooperative binding between EBNA-1.
The cooperative binding is varied from 0-40 % of the DNA binding
strength of EBNA-1. Adding cooperative binding for EBNA-1 only
increases the effective Hill coefficient from 3.5 to 7.

FIG. 6: Hill coefficient curves for the model with a high concentration
of the competitive molecular species (Oct-2), double-side blocking
and different strengths of cooperative binding between EBNA-1.
The cooperative binding is varied from 0-40 % of the DNA binding
strength of EBNA-1. Double blocking in itself gives a high effective
Hill coefficient, and the extra cooperative interactions almost double
this coefficient, up to 18.

Number of sites    Activity
20                 280
19                 229
17                 226
14                 169
12                 206
11                 169
 8                  87
 6                  19
 4                  11
 3                   3.3
 2                   2.1
 1                   1.2

TABLE I: Activity of the Cp promoter in EBV strains with different
numbers of binding sites for EBNA-1 in the family of
repeats site, adapted after Activity level relative to control.



